Refactor PSQL incident store into ORM (SQLAlchemy) #211
Conversation
A few questions, but it looks way nicer/more maintainable 👍
kai/service/incident_store/psql.py
Outdated
def __init__(self, args: KaiConfigIncidentStorePostgreSQLArgs):
    self.emb_provider = EmbeddingNone()

application_name: Mapped[str] = mapped_column(primary_key=True)
generated_at: Mapped[datetime.datetime] = mapped_column(
I think we probably don't want to use this as the primary key, since it will still be unique even if you insert the same report twice. Ideally, if we picked up the same report twice we would not duplicate it.
My thought process was that this was just a record of all reports in the order they were processed. What are you thinking as the primary key?
I think that is accurate, but we also don't want to duplicate the reports coming in. Ideally we can pull an identifier (or a combination of identifiers) off the Report itself. I say this mostly because the Hub importer polls the API; if it restarts, it could suck up all the previous reports again, so we want to make sure those only go in once (otherwise we could see something like a crashloop that successfully loads 50 reports each cycle, and the database just balloons in size). We could do that filtering elsewhere if needed, but if there's an identifier on the report that can be used, I think that would be ideal.
Gotcha, that makes sense. How about storing the commit that generated the report instead?
Hmm, I think that could be part of an identifier, but we could feasibly get multiple reports against the same commit. I'd be cool with adding an ID field to the report object and requiring us to pass it in. I think we'll have report IDs on the Konveyor side, so we'd just need to figure out how to populate it in the on-disk scenario.
Ok solid, I think I see what you're saying now
webapp["model_provider"] = ModelProvider(config.models)
KAI_LOG.info(f"Selected provider: {config.models.provider}")
KAI_LOG.info(f"Selected model: {webapp['model_provider'].model_id}")

webapp["incident_store"] = IncidentStore.from_config(
    config.incident_store, webapp["model_provider"]
)
Passing the model provider to the incident store so we can do LLM summaries next
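A rough sketch of the wiring this comment describes, with illustrative names rather than Kai's actual API: `from_config` accepts the model provider alongside the store config, and the store keeps a reference to it so it can call the LLM for summaries later.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ModelProvider:
    """Stand-in for Kai's model provider; model_id mirrors the log line above."""

    model_id: str

    def invoke(self, prompt: str) -> str:
        # Placeholder for a real LLM call.
        return f"[{self.model_id}] {prompt}"


class IncidentStore:
    def __init__(self, config: Any, model_provider: ModelProvider) -> None:
        self.config = config
        # Holding the provider here is what makes LLM summaries possible
        # from inside the incident store.
        self.model_provider = model_provider

    @classmethod
    def from_config(cls, config: Any, model_provider: ModelProvider) -> "IncidentStore":
        # The real implementation would dispatch on the configured backend
        # (e.g. PostgreSQL vs. in-memory) before constructing the store.
        return cls(config, model_provider)

    def summarize_solution(self, diff: str) -> str:
        return self.model_provider.invoke(f"Summarize this fix:\n{diff}")
```

The design choice is just dependency injection: the store never constructs its own provider, so the webapp can configure one provider once and share it.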
Force-pushed from c26e2d7 to e77afcc
Signed-off-by: Jonah Sussman <sussmanjonah@gmail.com>
Looks great!
return f"SQLIncident(violation_name={self.violation_name}, ruleset_name={self.ruleset_name}, application_name={self.application_name}, incident_uri={self.incident_uri}, incident_snip={self.incident_snip:.10}, incident_line={self.incident_line}, incident_variables={self.incident_variables}, solution_id={self.solution_id})"

# def dump(sql, *multiparams, **params):
Can drop this I assume
You will have to purge the data volume and delete all containers using it:

podman rm $(podman ps -a --filter volume=data -q)
podman volume rm data

Should do the trick.